Abstract 

Waterflooding is among the oldest and perhaps the most 
economical of oil recovery processes to extend field life and 
increase ultimate oil recovery from naturally depleting 
reservoirs. During waterflood operations, water is injected 
into the reservoir to maintain a certain reservoir pressure as 
well as to push the oil in the reservoir towards the 
producing wells. Organizations strive for lean and 
efficient technologies and processes to maximize profit, 
including when they look deeper into their reservoir 
portfolios to identify additional waterflooding 
opportunities. Time and information 
constraints can limit the depth and rigor of such a screening 
evaluation. Time is reflected by the effort of screening a vast 
number of reservoirs for the applicability of implementing a 
waterflood, whereas information is reflected by the 
availability and quality of data (consistency of measured and 
modeled data with the inherent rules of a petroleum system) 
with which to extract significant knowledge necessary to 
make good development decisions. 

A new approach to screening a large number of reservoirs 
uses a wide variety of input information and satisfies a 
number of constraints such as physical, financial, 
geopolitical, and human constraints. In a fully stochastic 
workflow that includes stochastic back-population of 
incomplete datasets, stochastic proxy models over time 
series, and stochastic ranking methods using Bayesian belief 
networks, more than 1,500 reservoirs were screened for 
additional recovery potential with waterflooding operations. 
The objective of the screening process is to reduce the 
number of reservoirs by one order of magnitude to about 
100 potential candidates that are suitable for a more detailed 
evaluation. Numerical models were used to create response 
surfaces as surrogate reservoir models that capture the 
sensitivity and uncertainty of the influencing input 
parameters on the output. Reservoir uncertainties were 
combined with expert knowledge and environmental 
variables and were used as proxy model states in the 
formulation of objective functions. The input parameters 
were initiated and processed in a stochastic manner 
throughout the presented work. The output is represented 
by a ranking of potential waterflood candidates. 

The benefit of this approach is the inclusion of a wide range 
of influencing parameters while at the same time speeding 
up the screening process without jeopardizing the quality of 
the results. 
Introduction 

Chevron Nigeria Ltd. is successfully applying 
waterflooding to increase oil recovery in a limited 
number of reservoirs. Current waterfloods represent 
only about 25% of the risked oil-in-place volumes and 
about 3% of reservoirs. Field development plans, which estimate 
the ultimate recovery of each field, recognize that 
significant contingent resources associated with 
waterflooding may exist in over 1,500 reservoirs under 
primary depletion. A screening exercise over the entire 
reservoir portfolio was conducted to identify potential 
waterflood candidate reservoirs. The main challenges 
of screening exercises are usually the tight time frame 
that is imposed on the schedule and the limited 
manpower available as well as the recognition and 
management of uncertainties due to lack of data. A 
large number of reservoirs require a phased approach, 
with the analysis detail increasing in each phase to 
address the decision (FIG. 1). 

The objective of Phase 1 is the evaluation of all 
reservoirs in the portfolio using only reservoir and 
field-level data to develop a ranked list of about 100 
candidate reservoirs for waterflooding. As an 
important part of the decision-making process, the 
ranking exercise should integrate not only the 
reservoir's resource potential, but also environmental 
and operational considerations such as security, 
existing infrastructure, and economic feasibility. 


FIG. 1 SKETCH OF THE THREE-PHASED SCREENING 
APPROACH TO IDENTIFY RESERVOIR CANDIDATES FOR 
WATERFLOOD IMPLEMENTATION DEPICTING THE AMOUNT 
OF DATA AND RESOLUTION REQUIRED FOR EACH PHASE 
(FIGURE LABELS: RESERVOIR AMOUNT, DATA AMOUNT) 

Subsequent phases will further reduce the number of 
potential reservoirs by one order of magnitude. Spatial 
and time-dependent data at the well level are included 
in Phase 2 and analyzed with methods that provide 
appropriate accuracy in their prediction. Phase 3 
evaluates data at the completion level and applies 
rigorous field development planning techniques to 
deliver probabilistic resource assessments, reservoir 
management strategies, and depletion plans. 
Operational considerations will remain as ranking 
factors through all three phases. Both the operational 
and reservoir assessments will provide data for 
investment decisions. 

This paper describes Phase 1 of the screening exercise. 

Workflow Overview 

The workflow for the screening study consists of four 
main components. Firstly, the entire dataset is 
surveyed, and key parameters for the classification of 
reservoir types are identified. If data are obviously not 
consistent or missing, they are flagged to indicate 
potential problems. Severe conditions may exist for a 
database in which the amount of missing or 
inconsistent data is so high that a screening study is 
rendered incapable of delivering sensible results. As 
can be seen in FIG. 2, which depicts the completeness 
of the study database, the oil viscosity is available 
for fewer than 20% of all reservoirs. Since viscosity 
is one of the main parameters influencing the 
waterflood (Craig, 1971), a qualitative statement on 
the viability of water injection is not possible for 
more than 80% of all reservoirs. To overcome this 
database deficiency, the dataset is back-populated 
using multivariate correlation techniques. 
In the second step, statistical analysis of clustered 
parameters and the identification of the inherent error 
of each data member are used to transfer both original 
and back-populated data to a probabilistic database, in 
which each data entry contains a mean value and a 
standard deviation. The completed dataset is mined 
for similarities using self-organizing maps (SOM) to 
classify all reservoirs into clusters based on the 
nonobvious and intrinsic properties of their key 
parameters. 

The third step involves the construction of stochastic 
proxy models in the form of response surface models 
(RSM). These are used to formulate the objective 
function, that is, the difference in oil recovery 
between a development with and without water 
injection, and they allow qualitative conclusions on 
the recovery factor for each reservoir in both cases. 
Furthermore, environmental 
considerations and expert knowledge are captured in 
proxy models that define the economic and physical 
viability for waterflooding for individual reservoirs. 
The stochastic output of the viability proxies is used as 
states for the Bayesian belief network (BBN) in the 
fourth step and is combined with the differential 
recovery proxy to construct the objective function. The 
Bayesian network is used to calculate the joint 
probabilities for all proxies feeding into the objective 
function to evaluate the applicability for waterflood 
implementation for a specific reservoir. Based on the 
resulting probabilities, all reservoirs are ranked for the 
potential of success of waterflood implementation. 

Back-population of Data 

For the purposes of this paper, data can be divided 
into two categories: the base and derived parameters. 
The former is the observed data as measured in 
laboratories and in the field. The latter can be either 
calculated from the former or reflect states or 
conditions (such as "if-else" statements). It can be 
assumed that a relationship between some or all data 
exists in one way or another. Correlatable data might 
be linked through empirical relationships, as with 
pressure/volume/temperature (PVT) data, or through 
physical or mathematical processes, as with sweep 
efficiency (i.e., how much oil is displaced by the 
injected water), recovery, and oil viscosity. There are 
also inherent relationships that can be observed. 
Examples are the linear relationship of temperature 
with depth, the logarithmic relationship of 
permeability with depth (linear for porosity), and a 
power-law relationship of reservoir size with depth 
(assuming that deeper formations were exposed to 
tectonic forces longer than shallower formations). For 
the more complex and less obvious relationships, 
nonlinear, multilayered, parallel regression techniques 
have been used in this paper. The advantage of these 
methods is that all data can be considered 
simultaneously, so that even hidden or nonobvious 
relationships can be found (Pyle, 1999; Zangl, 2003). 
Moreover, a vast amount of data 
can be looked at in a minimum of time. 

Probabilistic Databases 

The introduction of statistical methods and 
uncertainty management in the workflow makes it 
possible to overcome deficiencies related to incomplete 
datasets and guarantees meaningful screening over 
entire datasets (Graf et al. 2008). Rather than 
representing one world in deterministic databases, 
probabilistic databases capture multiple worlds and 
deliver a probabilistic answer to queries. The 
probabilistic representation of the data members (in 
its simplest form, an uncertainty distribution) allows 
managing the imprecision of data such as nonmatching 
data values, imprecise queries, inconsistent data, 
misaligned schemas, etc. (Dalvi and Suciu 2005). The 
statistical 
analysis of the relation of each data member and its 
error to expected and calculated value(s), or inherent 
errors such as from the measurement, can derive the 
distribution and the validity range for each data 
member. Incomplete data can be back-populated 
relationally with a distribution describing the amount 
of and the confidence in the available data for a 
specific data member. Moreover, relational data can be 
used to define the predictability of the data and 
introduce the inherent error. The database would then 
contain only probabilistic data. Measured data would 
normally have a higher confidence and back- 
populated data would have a larger uncertainty range 
(or standard deviation). 

Screening of Probabilistic Data 

An operation or the application of an algorithm on the 
probabilistic database results in a stochastic output, 
whose distributions reflect the uncertainties of the 
input parameters captured by the database. In contrast 
to screening with deterministic data, which are 
unambiguous in the ranking exercise, the screening of 
probabilistic data must consider the entire 
distribution. Hence, the decision on the ranking must 
consider a multitude of factors such as mean values, 
confidence levels, and delimiting conditions. A 
probabilistic reasoning method deploying BBNs can 
be used as a screening algorithm to compute the 
probability of success for a particular objective 
function. 

Database Construction and Reconstruction 

The project database contained more than 100 
different data members, some of which are depicted in 
FIG. 2. However, initially only a subset of about 12 
parameters was considered relevant for estimating 
the benefit of water injection, and these were generally 
focused around in-place volumes and reservoir rock 
and fluid properties. 

The overall completeness of the database was about 40% 
(with the permeability, for example, at about 5% and 
the initial reservoir pressure at about 87%), and 
dataset consistency was only 6% over all reservoirs. A 
successful ranking exercise would not have been 
possible under the initial condition of the database. 

Back-population of Missing Data 

To correlate as many characteristics (parameters) of a 
given reservoir and, consequently, increase the quality 
and confidence of the conclusions for reservoir 
classification, the data gaps must be filled. The 
gap-filling methodology applied here involves 
multidimensional cross-correlation using SOMs. 

The benefit of this methodology lies in the 
multivariate approach, which is much stronger and 
less error-prone than the nonlinear trending 
methodologies usually applied for these purposes 
(Zangl 2003). Provided the multidimensional 
relationship is inherent in the data and the quality of 
the data has been increased through outlier removal, 
the SOM is able to estimate a value for a missing data 
point by learning the relationships among the 
parameters. 

The SOM models in this case are used as a regression 
tool to compute missing values based on the available 
values of a measurement and on the identified 
correlations among the parameters. The SOM computes 
a model that describes the measurements in a data 
cloud in an optimal way. For this purpose, it creates a 
smaller, virtual data cloud with the same dimensions 
(measurement parameters) as the observations and 
tries to fit this virtual data cloud to the observed data 
cloud by placing the virtual nodes as close as possible 
to the measured nodes. The measure of proximity is 
the Euclidean distance (Hair, 1998). An optimization 
algorithm distributes the virtual nodes in the 
multidimensional space so that the sum of Euclidean 
distances to the measurement nodes is minimized, 
which corresponds to the best fit of the virtual cloud 
to the observation cloud. By matching the virtual with 
the observed data cloud, each measurement is 
associated with a virtual node, and many observation 
nodes can share a common virtual node. Once the 
global and local errors have been analyzed and found 
acceptable, the matched virtual data cloud generated 
by the SOM is a reasonably valid virtual 
representation of the observation cloud and can be 
used to predict missing values of incomplete 
measurements. To achieve this, the incomplete 
measurement is placed into the virtual data cloud as 
close as possible to the virtual node that is most 
similar to it (according to the Euclidean distance 
calculated from the available values). It can then 
reasonably be assumed that the incomplete 
measurement is similar to the virtual node to which it 
has been assigned; hence, the missing values in the 
measurement are taken to be equal to the values of the 
same parameters of the virtual node. 

FIG. 2 EXAMPLES OF DATA AND THEIR COMPLETENESS OVER 
THE ENTIRE RESERVOIR PORTFOLIO 

In the presented workflow, the back-population 
procedure is used in a slightly more advanced way: 
rather than just applying the deterministic values of 
the closest virtual node to the gaps, it looks at all 
adjacent, similar nodes (same cluster) and 
back-populates the missing value as a stochastic 
representation through a normally distributed 
probability function defined by the mean and standard 
deviation of the obtained values. This rigorous 
consideration of measurement and model uncertainties 
helps to improve the result of the investigation and 
reduces bias in the analysis. The limitation of this 
approach is that the back-population procedure, as 
applied here, can only be used with a reasonable 
degree of accuracy when the available values of an 
incomplete measurement are sufficient to place it 
confidently in the virtual data cloud. If that is the case, 
the overall accuracy of the model is not impaired by 
introducing the incomplete measurements. 
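
The back-population idea described above can be 
illustrated with a minimal sketch. It is not the 
implementation used in the study: the grid size, the 
learning schedule, and the use of the k nearest virtual 
nodes as a stand-in for "adjacent nodes of the same 
cluster" are assumptions, and the names train_som and 
back_populate are hypothetical. Inputs are assumed to 
be standardized so that Euclidean distances are 
comparable across parameters. 

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Fit a small SOM to complete rows of shape (n_samples, n_params)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the virtual nodes from randomly chosen observations.
    nodes = data[rng.choice(len(data), rows * cols, replace=True)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3      # shrinking neighbourhood
            # Best-matching unit: virtual node closest in Euclidean distance.
            bmu = np.argmin(((nodes - x) ** 2).sum(axis=1))
            grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            nodes += lr * h[:, None] * (x - nodes)   # pull neighbours towards x
            step += 1
    return nodes

def back_populate(record, nodes, k=3):
    """Estimate the NaN entries of one record as (mean, std).

    The distance uses only the available values; the k closest virtual
    nodes (an assumed proxy for "same cluster") supply the statistics.
    """
    known = ~np.isnan(record)
    d = np.sqrt(((nodes[:, known] - record[known]) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    candidates = nodes[nearest][:, ~known]
    return candidates.mean(axis=0), candidates.std(axis=0)
```

A record with a missing value is passed with NaN in the 
missing position; the returned mean and standard 
deviation then define the normal distribution stored in 
place of the gap. 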

Completeness and Consistency Check 

The methodologies applied in the back-population 
process require the datasets to adhere to a certain 
consistency, which has a direct impact on the degree 
of completeness. Measurement gaps in datasets can be 
filled using regression approaches as described in this 
paper. However, the consistency with the natural rules 
and concepts of a petroleum reservoir has to be 
maintained at all times in the process to guarantee the 
reliability of the results. Therefore, only those 
datasets that have enough measurements available 
(and hence leave only a few degrees of freedom to the 
back-population approach) can be back-populated 
with a sufficiently high degree of certainty, because 
they allow a more precise estimate of the missing 
value(s). Analyzing the completeness aids in 
identifying those parameters that can be used for 
further analysis and those that need to be 
back-populated to be included in the workflow. 

Although the initial reservoir pressure is the most 
complete of all available reservoir parameters, it is 
unknown for more than 13% of the reservoirs. In at 
least 90% of the reservoirs, factors such as average 
permeability and oil compressibility are unknown. 
FIG. 2 shows the 
completeness of zero-dimensional data from Chevron 
Nigeria Ltd.'s entire reservoir portfolio and illustrates 
the variability in the completeness of reservoir 
parameters available for study. 

Parameters like permeability, compressibility, and oil 
viscosity show the lowest degree of completeness 
and thus represent the limiting parameters for the 
reservoir classification and the back-population 
process. In the following combinatorial analysis, 
combinations of parameters are tested with regard to 
the number of complete patterns they can yield when 
back-population is used to estimate the missing 
values, while still obeying the rules of consistency. 

FIG. 3 STEP-WISE SOMS FOR BACK-POPULATION WITH THE 
COMPLETENESS LEVEL AT THE BOTTOM (BACK-POPULATED DATA 
ARE ORANGE) 

As an example, varying combinations of the identified 
limiting parameters with a maximum number of six 
parameters increase the number of complete patterns 
from 69 to 1,110 (4 to 65%), depending on the original 
completeness of the parameters. Obviously, those 
parameters with a higher completeness, such as 
estimated initial pressure (completeness of 86.7%) or 
oil depth (79.5%), yield a substantially higher number of 
complete patterns than combinations with oil 
compressibility (9.4%) or oil viscosity (18.5%). 

The consistency analysis thus guides the 
back-population process in the setup of proxy models, 
because it identifies those combinations of 
parameters that yield the highest overall completeness 
of the dataset. 
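
The combinatorial analysis can be sketched as follows. 
This is an illustration under assumptions, not the 
study's procedure: missing values are assumed to be 
stored as NaN, a pattern is counted as "fillable" when 
exactly one value of the combination is missing, and 
the function name rank_combinations is hypothetical. 

```python
from itertools import combinations

import numpy as np

def rank_combinations(table, names, min_size=4, max_size=6):
    """Rank parameter combinations by the patterns they can yield.

    table: (n_reservoirs, n_params) array with NaN for missing values.
    Returns tuples (combination, n_complete, n_fillable), best first.
    """
    present = ~np.isnan(table)
    results = []
    for size in range(min_size, max_size + 1):
        for combo in combinations(range(len(names)), size):
            n_present = present[:, list(combo)].sum(axis=1)
            # Rows that already hold every value of the combination.
            n_complete = int((n_present == size).sum())
            # Rows missing exactly one value: candidates for back-population.
            n_fillable = int((n_present == size - 1).sum())
            results.append((tuple(names[i] for i in combo), n_complete, n_fillable))
    return sorted(results, key=lambda r: r[1] + r[2], reverse=True)
```

Combinations at the top of the returned list are those 
that would give the largest number of complete patterns 
after back-population of a single parameter. 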

SOM Proxies 

The combinatorial analysis has yielded eight 
combinations of parameters (minimum of four, 
maximum of six parameters), which are used as SOM 
proxy models to consistently back-populate individual 
parameters (Kohonen, 1990, 1997; Hair, 1998). FIG. 3 
shows the proxy models. Each relates a set of input 
parameters (in yellow) to one or, in one specific case, 
two output parameters (in light blue), where the 
output is the estimate from the SOM back-population 
process. 

In the case of the limiting parameters permeability, 
compressibility, and oil viscosity, it can be seen that 
the gain is the largest (between 43 and 62%), thus 
contributing to an overall gain of 13.5% by 
combination of 16 parameters. 

This gain might not seem substantial but it unlocks a 
much greater number of patterns (1,120), combined 
with a larger number of parameters (12), thus 
increasing the quality of the subsequent reservoir 
classification. 

Blind Test and Validation 

To prevent SOM proxy outcomes from being 
influenced by observer bias, and to verify the stability 
and predictive ability of the proxy models, a blind test 
is performed after each back-population. The blind test 
is used to identify the error that is induced during the 
back-population process. This error is inversely 
proportional to the quality of the SOM proxy and is 
therefore directly reflected in the overall error of the 
back-populated dataset. 



FIG. 4 CROSSPLOT OF ORIGINAL VERSUS RECALL OF 
RESERVOIR TEMPERATURE FOR BLIND TEST OF SOM PROXY 
#1 (RMS=0.965) WITH TABLE DEPICTING EXAMPLES OF 
RESERVOIR MEMBERS WITH THEIR RECALL ERROR 

The blind test is performed by removing 5% of all 
complete patterns. A SOM is subsequently trained on 
the remaining patterns, after which a recalculation is 
performed to back-populate the removed values. When 
the back-populated values are compared with the 
removed values, the error should not exceed 5% (see 
FIG. 4); a SOM proxy with a larger error is unsuitable 
and not acceptable for back-population (Zangl 2003). 
Additionally, the values for the blind test are evenly 
distributed across the entire parameter range to assess 
the validity over the entire SOM proxy range. 
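
A blind test of this kind can be sketched as below. The 
sketch makes several assumptions: the error is taken as 
the mean relative deviation, the hold-out values are 
picked at evenly spaced ranks of the sorted target, and 
a straight-line fit stands in for the SOM proxy. The 
names blind_test, fit_line, and predict_line are 
hypothetical. 

```python
import numpy as np

def blind_test(x, y, fit, predict, frac=0.05, limit=0.05):
    """Hold out `frac` of the complete patterns and measure the recall error."""
    order = np.argsort(y)
    n_hold = max(1, int(round(frac * len(y))))
    # Evenly spaced ranks cover the whole range of the target parameter.
    picks = order[np.linspace(0, len(y) - 1, n_hold).round().astype(int)]
    mask = np.zeros(len(y), dtype=bool)
    mask[picks] = True
    model = fit(x[~mask], y[~mask])          # train on the remaining patterns
    recall = predict(model, x[mask])         # back-populate the removed values
    rel_err = np.abs(recall - y[mask]) / np.abs(y[mask])
    mean_err = float(rel_err.mean())
    return mean_err, bool(mean_err <= limit)

# Stand-in proxy (assumption): a straight-line fit instead of a SOM.
def fit_line(x, y):
    return np.polyfit(x, y, 1)

def predict_line(model, x):
    return np.polyval(model, x)
```

The second return value indicates whether the proxy 
stays within the 5% limit and would be accepted for 
back-population. 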

SOM Proxy Quality and Completeness Ratio 

The previously defined blind test error is used to 
define the error ranges for each individual data 
member, regardless of whether it is observed or 
back-populated by the SOM proxy. This error is 
considered for each value in the database to contribute 
to the probabilistic character of the data members and 
is referred to as the SOM quality. The error is used to 
weight the standard deviation of the data range for 
each cluster. 

In addition to the SOM quality, the completeness 
ratio is introduced. The completeness ratio is a 
measure of the degree of back-population carried out 
for an individual parameter. It is derived by 
calculating the ratio of the initial to the final number of 
records. As described earlier, permeability, 
compressibility, and viscosity show the lowest values 
for the completeness ratio, as these parameters exhibit 
the lowest completeness. The SOM quality values from 
the blind tests for the individual parameters range 
from 0.47 to 6.39%, which can be attributed to the 
removal of outliers in the original dataset. 

Reservoir Classification 

It is apparent that the reconstructed database, or in 
fact any database, cannot be used in its entirety for an 
effective (i.e., rapid) screening exercise that uses a 
sophisticated algorithm and derives a valid objective 
function for all data members at the same time. The 
screening algorithm works most accurately when it is 
applied to similar data members, distinguishing 
between groups in which, as in this case, different 
reservoir mechanisms prevail. For example, it can be 
assumed that the screening algorithm for identifying 
the benefits of water injection will work differently for 
oil reservoirs with and without gas caps. FIG. 5 depicts 
the SOM clustering of all reservoirs over 12 
parameters. The seventh SOM plot, for example, 
which shows the gas cap factor m, groups gas-cap 
reservoirs in separate clusters like "islands" in the blue, 
non-gas-cap "sea." A screening algorithm dedicated 
to the clusters rather than the entire database will 
yield more accurate results. 

SOM clustering has significantly condensed the data 
from over 1,500 individual reservoirs to 17 groups of 
reservoirs with comparable properties and possibly 
with similar behavior. Each of these 17 groups is 
clearly defined by particular features that vary 
significantly from one group to the other (e.g., 
significantly different viscosities, porosities, etc.) but 
vary only slightly for the reservoirs within a 
particular group (Kohonen, 1990, 1997; Hair, 1998). 
FIG. 6 shows the property distribution and the mean 
value of four example properties for each of the 
identified clusters. 

Conversion to a Probabilistic Database 

Each cluster can be statistically analyzed for its 
property distributions assuming a normally 
distributed set of data to derive mean values and 
standard deviations. FIG. 7 shows the statistical 
indicators of the various properties for some clusters. 
FIG. 6 PVT PROPERTY DISTRIBUTION WITH MINIMUM, MEDIAN AND MAXIMUM VALUES FOR THE 17 CLUSTERS 





Statistics Research Letters (SRL) Volume 2 Issue 1, February 2013 


www.srl-journal.org 


PVT stands for the "pressure", "volume", 
"temperature" relationship. To avoid physically 
impossible or incorrect values during the screening 
exercise, minima and maxima (the lowest and highest 
values observed in a particular cluster) must delimit 
the bell-shaped distributions. 

The cluster distributions for a parameter are applied to 
the individual data members by overlaying the 
standard deviation of the cluster onto the database 
value. The SOM proxy error from the blind test is used 
to modify the standard deviation. Each individual 
data member then contains a mean value, an 
individual standard deviation, and the physical upper 
and lower limits from the cluster. 
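
A probabilistic data member of this form can be 
sampled as sketched below. The text does not state how 
the blind test error modifies the standard deviation; 
inflating the cluster standard deviation by the factor 
(1 + error) is an assumption made only for 
illustration, as is the function name sample_member. 
The sketch also assumes that the mean lies between the 
cluster limits. 

```python
import numpy as np

def sample_member(mean, cluster_std, som_error, lower, upper, n=1000, seed=0):
    """Draw n values from a normal distribution delimited by cluster limits.

    mean        : database value (measured or back-populated)
    cluster_std : standard deviation of the parameter in the cluster
    som_error   : blind test error as a fraction, e.g. 0.05 for 5%
    lower/upper : lowest and highest value observed in the cluster
    """
    rng = np.random.default_rng(seed)
    # Assumed weighting: a poorer proxy widens the distribution.
    std = cluster_std * (1.0 + som_error)
    samples = np.empty(0)
    while samples.size < n:
        draw = rng.normal(mean, std, size=2 * n)
        # Rejection step: discard physically impossible values.
        keep = draw[(draw >= lower) & (draw <= upper)]
        samples = np.concatenate([samples, keep])
    return samples[:n]
```

The returned samples can be fed directly into a 
Monte Carlo run of a screening engine. 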

The Screening Engine 

For the purpose of this paper, a screening engine is 
defined as an algorithm that captures a process or 
method to get from the input signal to the output 
signal. The screening engine could capture a process 
of any complexity. A mathematical operation on the 
database would be a fairly basic engine to derive an 
objective function for screening, but the engine could 
also be as complex as deriving the possible oil 
recovery and production rate of reservoirs. Moreover, 
since the database is probabilistic, the screening 
engine must run in a Monte-Carlo-type simulation to 
construct the output distribution. Considering the 
extreme amount of calculation necessary to define the 
output, numerical screening engines, as in this case 
for the recovery definition, are impractical. However, 
proxy models that capture the input-output 
relationship of the objective function are feasible. 


The challenge for a stochastic computation like the one 
described here is to compute complex numerical 
reservoir simulation models in a fraction of the time of 
a conventional numerical engine in order to allow for 
many different realizations in a reasonable time. 
Reservoir simulation is usually based on the 
finite-difference/finite-volume approach to efficiently 
handle the multiphase flow in porous media using 
tens to hundreds of thousands of grid blocks. The 
screening engine hence needs to take advantage of a 
suitable proxy that can execute fast enough to allow 
for a Monte Carlo analysis with several thousand runs 
for over 1,500 reservoirs. The proxy selected in the 
presented work is based on a surrogate reservoir 
model (SRM). Surrogate reservoir models yield results 
similar to those of an actual numerical simulation 
model of a reservoir while running in a fraction of a 
second (Mohagegh et al. 2006). In contrast to a 
reservoir simulation engine, which takes into account 
the laws of physics and fluid flow in porous media, a 
surrogate model as a proxy model is typically merely 
"calibrated" on a small sample of input-output datasets 
from previous simulation runs. The sample datasets 
should ideally be very small while still containing 
enough scenarios to cover the whole space of possible 
input and output, as described in several experimental 
design (ED) strategies (Box, 1960; Damsleth et al. 1991; 
Dejean and Blanc 1999). After being calibrated on 
these samples, the SRM-based proxy is capable of 
mimicking the simulation model over the full range 
even though it does not have to go through the 
complex computation of a numerical engine. Hence, 
the execution time of a surrogate model is typically a 
fraction of a second, and it can therefore be 
considered a very suitable approach to run multiple 
thousand reservoir simulation runs in a stochastic 
layout. 

FIG. 8 MULTI-RESPONSE PLOT FOR THE OIL RECOVERY OF CLUSTER 4 WITH WATER INJECTION 

The aim of the SRM in this work is to approximate a 
process by a simple regression model that fits the true 
response with a sufficient degree of accuracy and 
hence acts as a quick response surface. The response 
surface approximates the process by regressing on the 
experiments that were executed using ED. A large 
variety of proxy models exist that can train on the 
history of an input-output relationship and predict 
unknown output responses, predict the output 
response of an objective function from input data 
alone, or use a combination of both (Zangl et al. 2006). 
It must be noted that the response surface based on a 
proxy model is only an approximation and is not as 
accurate as the process itself, but neither are the data 
members in a probabilistic database. It can be assumed 
that the simulation of the proxy model over the entire 
input spectrum will derive an output distribution that 
eventually covers all inherent errors and uncertainties. 
Moreover, it is the speed and not the exact solution 
that is required for the screening exercise. 
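
The idea of regressing a response surface on designed 
experiments can be sketched as follows. The quadratic 
polynomial form, the function names, and the 
toy_simulator function, which merely stands in for a 
numerical simulation run, are assumptions for 
illustration; the text does not specify the functional 
form of the proxy used in the study. 

```python
import numpy as np

def quadratic_features(x):
    """Columns: 1, x_i, and all products x_i * x_j with i <= j."""
    n, d = x.shape
    cols = [np.ones(n)] + [x[:, i] for i in range(d)]
    cols += [x[:, i] * x[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def fit_response_surface(x, y):
    """Least-squares fit of the quadratic surface to the experiments."""
    coef, *_ = np.linalg.lstsq(quadratic_features(x), y, rcond=None)
    return coef

def evaluate(coef, x):
    """Evaluate the fitted surface at new input points."""
    return quadratic_features(x) @ coef

def toy_simulator(x):
    """Hypothetical stand-in for a simulated recovery factor."""
    return 0.30 + 0.10 * x[:, 0] - 0.05 * x[:, 1] \
        + 0.02 * x[:, 0] * x[:, 1] - 0.03 * x[:, 0] ** 2

# Three-level full factorial design in two scaled uncertainty parameters.
levels = np.array([-1.0, 0.0, 1.0])
design = np.array([(a, b) for a in levels for b in levels])
coef = fit_response_surface(design, toy_simulator(design))
```

Once fitted, evaluate can be called many thousands of 
times at negligible cost, which is what makes a 
Monte Carlo analysis over the whole portfolio 
practical. 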

Recovery Proxy 

The most important objective function to address the 
impact of the waterflood on a reservoir is the recovery 
factor. The difference between natural depletion and 
water injection can be determined through the 
recovery factor to allow a qualitative statement on the 
benefits of secondary recovery for individual 
reservoirs. 

A simple two-dimensional, slanted numerical model 
has been constructed that represents a cross-section 
through a generic-type reservoir. A production well is 
located in the attic in the case of an oil reservoir and 
halfway between the contacts in a gas-cap reservoir. 
One injection well is located directly in the aquifer to 
provide support in the waterflood experiments. The models 
are populated with uncertain rock and fluid properties 
defined from the clustering exercise. Inherent 
uncertainties such as aquifer strength and 
heterogeneity are also included. 

ED is used to effectively simulate all possible 
combinations of the uncertainty parameters. The 
combination of stochastic and deterministic 
uncertainties requires about 600 experiments to be 
simulated for each recovery strategy and each cluster 
(FIG. 8) amounting to about 20,000 simulation runs to 
construct the necessary output space for the RSM. 

The RSMs have been used on the database to define 
the benefits of waterflooding according to the cluster 
properties and their associated uncertainties. A 
Monte Carlo simulation is performed on each data 
member considering its distribution from the 
probabilistic database as constructed with the help of 
the clusters. The output distribution (the difference 
between the recovery with and without water 
injection) is a qualitative statement on the benefits of 
waterflooding. The output distribution is then 
discretized for use in the BBN screening process. 
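
The Monte Carlo step and the subsequent discretization 
can be sketched as below. The two proxy functions, the 
single viscosity input, and the state thresholds are 
hypothetical placeholders chosen for illustration; 
they do not represent the RSMs or the states used in 
the study. 

```python
import numpy as np

def toy_rf_flood(viscosity):
    """Hypothetical recovery proxy with water injection."""
    return 0.55 - 0.04 * viscosity

def toy_rf_depletion(viscosity):
    """Hypothetical recovery proxy for natural depletion."""
    return 0.35 - 0.01 * viscosity

def differential_recovery(samples, rf_flood, rf_depletion):
    """Objective function: recovery with minus recovery without injection."""
    return rf_flood(samples) - rf_depletion(samples)

def discretize(delta, thresholds):
    """Turn an output distribution into state probabilities.

    With thresholds [t1, t2] the states are: below t1, t1 to t2, above t2.
    """
    states = np.searchsorted(np.asarray(thresholds), delta, side="right")
    counts = np.bincount(states, minlength=len(thresholds) + 1)
    return counts / counts.sum()

# One data member: viscosity sampled from its (assumed) distribution.
rng = np.random.default_rng(0)
viscosity = rng.normal(2.0, 0.4, 2000)
delta = differential_recovery(viscosity, toy_rf_flood, toy_rf_depletion)
state_probabilities = discretize(delta, [0.05, 0.15])
```

The resulting vector of state probabilities is the 
kind of input a BBN node can accept in place of a 
single deterministic value. 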

Candidate Screening 

The candidate screening is performed using an expert 
system based on a BBN. The BBN is applied to 
describe and reconstruct a complex decision process 
involving multiple parameters, adding the notion of 
uncertainty (Neapolitan 2004). The various input 
parameters in a BBN can either be conditionally 
independent (not having any influence on each 
other) or conditionally dependent. In the latter case, 
prior knowledge of the dependency of the various 
parameters must be quantified and entered as a 
conditional probability relation in the BBN; this is 
done either by using a Bayesian learning approach or 
manually through an expert (Mitchell 1997; Korb and 
Nicholson 2004). In this work, a BBN was set up 
purely by experts to consistently reproduce their 
reasoning process on a large number of samples 
(reservoirs), considering the various aspects of their 
decision such as economic, logistic, and reservoir 
considerations. The outcome is a score between 0 and 
100 that describes the reservoir's applicability for 
waterflood recovery. 

The input parameters in a BBN can either be 
continuously measured variables or discrete variables. 
In the given project, some of the input variables are 
entered considering uncertainty in the measurements, 
computation, or in the reasoning process, such as the 
recovery factor as discussed in the previous section. By 
discretizing the stochastic input into a limited number 
of categories called "states," the stochastic properties 
of the input can be fully considered while maintaining 
a computationally cheap BBN. The result of the BBN is 
a decision score, which represents the likelihood that a 
given reservoir is a good candidate for secondary 
recovery, fully considering the inherent uncertainty of 
the variable measurements and computations as well 
as the expert's decision process. To compute 
the score, the BBN processes the input distributions in 
the various nodes and their conditional interactions 
according to the expert logic, applying Bayes' theorem. 
Prior probabilities (expert knowledge) are multiplied by 
the input probabilities (observations or computations 
from the proxy model) and normalized to obtain 
probability values between zero (an impossible outcome) 
and one (a certain outcome), given both the 
observations and the expert knowledge. 
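
The multiply-and-normalize step can be sketched for a single discrete node as follows. The state names and the numerical values of the prior and of the evidence are hypothetical; only the operation itself (prior times input probability, divided by the sum over all states) follows the description above.

```python
def bayes_update(prior, likelihood):
    """Posterior over states: prior[s] * likelihood[s], renormalized to sum to 1."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("prior and evidence are incompatible")
    return {s: v / total for s, v in unnormalized.items()}

prior = {"good": 0.3, "poor": 0.7}        # hypothetical expert prior
likelihood = {"good": 0.8, "poor": 0.2}   # hypothetical P(observation | state)
posterior = bayes_update(prior, likelihood)
```

In this example the evidence favors the "good" state strongly enough to outweigh the pessimistic prior, so the posterior probability of "good" rises above one half.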

There are three different decision criteria leading to 
the final decision on whether a reservoir is a viable 
candidate for a waterflood operation or not. The three 
reasoning trains are conditionally independent and 
hence information from one process does not influence 
any other decision criteria, which is why experts from 
various disciplines can contribute to the reasoning 
system and benefit from a holistic expert system. A 
graphical representation of the BBN is given in FIG. 9. 
In the depiction, the gray boxes denote input variables, 
the white boxes decision points, and the arrows the 
parameter dependencies (Jensen 2001). The criteria are: 

1. Economic viability: This decision criterion takes 
into account the estimated recoverable 
hydrocarbon reserves that can be exploited using 
the same infrastructure (i.e., stacked reservoirs), 
the possible proximity to waterflooding 
infrastructure (e.g. from other fields), logistical 
issues (onshore, offshore), and the operational 
security. 

2. Physical viability: The physical viability criterion 
looks at the suitability of a reservoir for a 
waterflood regarding its reservoir and aquifer 
properties. 

3. Potential delta recovery: The third reasoning 
process evaluates the outcomes of the 
proxy model calculations. The objective is to 
quantitatively describe the potential additional 
recovery that can be realized by secondary 
recovery. 

Economic Viability 

An important aspect of the ranking procedure is the 
economic viability of a potential secondary recovery 
project. In contrast to other technical reasoning 
systems that focus on incremental recovery and/or 
incremental net-present-value (NPV), the given expert 
system takes into account factors such as the proximity 
to installations of existing waterflood operations in 
other fields, the logistics, and the 
operational aspects. 

The stacked oil-initially-in-place (OIIP) in combination 
with the possible proximity to other waterflood 
installations gives the initial estimate of whether a 
given reservoir is a good candidate to initiate 
waterflood operations. The higher the OIIP, the less 
important is the proximity to other waterflood 
infrastructure as a high OIIP would certainly justify 
the investments in necessary infrastructure. Once the 
volumetric and infrastructural decision has been made, 
the expert system investigates the operational security 
aspect as well as the geographical logistics. Some of 
the fields are offshore or in swamp areas, which 
downgrades the score of such a field even 
though it might be of significant volumetric 
value. Security concerns are also addressed: 
the expert system takes into account whether the given 
field is in an area that can be considered 
well-controlled and high-security. If this is not the 
case, the field is ranked lower in the candidate 
ranking process even though many other factors might 
suggest incremental benefits (or value). 

Physical Viability 

Based on available information or estimations of 
reservoir and aquifer properties, the drive energy of 
the candidate reservoir is assessed. Reservoirs with a 
high initial reservoir pressure and a weak aquifer are 
preferred as these reservoirs generally show the most 
benefits from waterflood operations. 

Production enhancement potential as well as 
displacement and sweep considerations are 
incorporated in the hydrocarbon delivery potential 
decision. The expert system favors reservoirs with a 
thick oil rim and hence high production enhancement 
potential. 

Reservoir Ranking 

The final decision node combines the findings from the 
economic viability decision, the physical viability 
decision, and the determined delta recovery and 
integrates them into a final conclusion. 

By introducing the three independent decision 
processes, with independent inputs, the formulation 
and quantification of the reasoning logic for the final 
decision becomes less complex. Each of the decision 
process branches can be considered individually and 
hence not all the inputs are fed into the same decision 
logic. The three independent decision process 
branches require a much less complex reasoning logic 
entered by the expert, compared to a single decision 
logic description that considers every parameter at the 
same time. The branches are processed in parallel and 
separately; only in the last decision, the score, are the 
three branches combined into a final decision. This 
approach significantly eases the work of the expert team that 
sets up the reasoning logic because the number of 
possible combinations of input variable states is 
reduced significantly. 
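
The size of this reduction can be made concrete with a small counting sketch. The layout below (nine inputs with three states each, split evenly over three branches whose outputs also have three states) is a hypothetical example and not the actual network of FIG. 9; it only compares the number of parent-state combinations an expert would have to quantify in a single decision node with the number needed in the branched structure.

```python
from math import prod

def table_rows(parent_state_counts):
    """Number of parent-state combinations to be quantified for one node."""
    return prod(parent_state_counts)

# Hypothetical layout: 9 inputs, 3 states each, split evenly over 3 branches.
branches = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
branch_output_states = 3  # assumed number of states of each branch decision

# One decision node fed by every input at once.
single_node_rows = table_rows([n for branch in branches for n in branch])

# Three branch nodes plus one final node fed only by the branch decisions.
branched_rows = sum(table_rows(b) for b in branches) + table_rows(
    [branch_output_states] * len(branches)
)
```

Under these assumptions the single node requires 3^9 = 19,683 combinations, whereas the branched structure requires 3 x 27 + 27 = 108.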

The final decision is represented as a score between 0 
and 100 points, with 0 points signifying a poor 
candidate and 100 points identifying a very promising 
and suitable candidate for waterflooding. The 
influence of the various process trains is weighted 
differently by the expert team according to the 
importance of each train in the decision process. 
Economic viability has the biggest impact, because a 
field with good reservoir and fluid properties and a 
good delta recovery is of limited interest if it is in a 
low-security area and/or in a swamp area. Physical 
viability and delta recovery are almost equally 
weighted; and the former has a slightly higher impact 
on the results of the final decision to compensate for 
potential uncertainties arising from the proxy model 
calculation. 
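
The effect of the weighting can be illustrated with a simplified sketch. In the BBN the final node is quantified by the experts through conditional probabilities; the linear weighted average below is a stand-in for that table, and the weights are hypothetical values chosen only to respect the stated order of importance (economic viability first, then physical viability slightly ahead of delta recovery).

```python
# Hypothetical weights; the study does not report numerical values.
WEIGHTS = {"economic": 0.60, "physical": 0.22, "delta_recovery": 0.18}

def final_score(branch_prob_good):
    """Weighted average of each branch's P(good), scaled to 0..100 points."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return 100.0 * sum(WEIGHTS[k] * branch_prob_good[k] for k in WEIGHTS)

# Good reservoir and good delta recovery, but poor economic viability.
good_rock_bad_location = final_score(
    {"economic": 0.1, "physical": 0.9, "delta_recovery": 0.9}
)
# Mediocre reservoir, but favorable economics, logistics, and security.
poor_rock_good_location = final_score(
    {"economic": 0.9, "physical": 0.1, "delta_recovery": 0.1}
)
```

With these assumed weights the technically attractive reservoir in an unfavorable location scores lower than the technically weaker reservoir in a favorable one, which mirrors the reasoning described above.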

Each candidate reservoir is scored according to this 
reasoning system. The results are sorted according to 
the likelihood that a particular candidate is a good 
candidate for waterflood operations (represented by 
the score). The results from the individual decision 
processes are stored along with the final result for 
every candidate reservoir to be able to investigate the 
reasons for a particular final score and to possibly 
manually adjust individual scores (e.g., a bad score 
might be due to a lack of available waterflood 
infrastructure; however, good reservoir properties and 
high estimated incremental recovery could make the 
reservoir still a good candidate for a waterflood). 
The final ranking is then checked by the entire 
integrated team to agree on a final short list of 
candidate reservoirs for secondary recovery. 
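
Keeping the branch results next to the final score, sorting, and flagging candidates for manual review can be sketched as follows. The record layout, the reservoir names, the scores, and the review thresholds are hypothetical and serve only to illustrate the bookkeeping described above.

```python
from dataclasses import dataclass

@dataclass
class ScoredReservoir:
    name: str              # hypothetical identifier
    economic: float        # branch scores, 0..100
    physical: float
    delta_recovery: float
    final: float           # combined score, 0..100

def rank(candidates):
    """Sort from most to least promising by final score."""
    return sorted(candidates, key=lambda r: r.final, reverse=True)

def flag_for_review(candidates, final_below=50.0, technical_above=70.0):
    """Names with a low final score despite strong technical branches."""
    return [
        r.name
        for r in candidates
        if r.final < final_below
        and min(r.physical, r.delta_recovery) > technical_above
    ]

candidates = [
    ScoredReservoir("R-001", economic=80, physical=60, delta_recovery=55, final=71),
    ScoredReservoir("R-002", economic=15, physical=85, delta_recovery=90, final=42),
    ScoredReservoir("R-003", economic=30, physical=35, delta_recovery=20, final=29),
]
ranked = rank(candidates)
review = flag_for_review(candidates)
```

Here the second reservoir is flagged: its final score is low because of the economic branch, while its physical and delta recovery scores are high, which is the kind of case the team may choose to adjust manually.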

Validation of Reservoir Ranking 

In an effort to understand the validity of the results of 
the stochastic screening exercise, the ranked list of 
candidate reservoirs was compared to a previously 
generated ranking list from standard deterministic 
screening exercises based on calculated key 
performance indicators (KPIs) and subjective 
estimations of reservoir behaviors from the data at 
hand. This screening effort used the identified 
characteristics of successful waterflood reservoirs as 
search criteria to recognize candidate reservoirs for 
new waterfloods. 

Although both methods used the same input 
parameters in their objective function, the main 
difference is in the introduction of the confidence in 
the data and the stochastic character of the database. 
This allows ranking technical key indicators without 
bias, whereas deterministic screening and 
expert-guided rankings are subjective. The two screening 
efforts proved to be complementary, and a composite 
list of candidate reservoirs was identified for the next 
phase of evaluation. Candidate reservoirs identified in 
the Phase 1 screenings were validated in workshops 
with each of the asset teams. Overall, the Phase 1 
screening results aligned well with asset team 
perceptions of additional waterflood potential. Asset 
team feedback recommended that only 9 reservoirs be 
dropped and 4 reservoirs be added to the 
original candidate list of 104 reservoirs, resulting in a 
final list of 99 waterflood candidate reservoirs. 

Additionally, benchmarking the reservoirs resulting 
from the stochastic screening against the 
reservoirs currently under waterflood showed consistent 
results, which supports the applied methodology. 
Also, before the study was conducted, only 25 
reservoirs in the reserve system were recognized as 
waterflood candidates with considerable secondary 
recovery potential, indicating there should be 
significant potential for resource and reserve additions. 

Conclusions 

To the authors' knowledge, this is the first attempt to 
create a probabilistic database using a stochastic, 
SOM-based back-population algorithm. Furthermore, 
it has been shown that SOMs can be used to describe 
the confidence in the existing and reconstructed data 
for input into a probabilistic database without bias and 
based only on statistical analysis of the data members. 
A screening engine, the recovery proxy, was constructed 
that used the database to define key indicators 
for input into a BBN-based ranking exercise. Although 
the screening process itself contains a significant 
number of procedures and algorithms, once they are 
identified for a particular problem, the ranking can be 
applied rapidly to similar datasets. In fact, the 
engine (the recovery proxy in this case) can be 
replaced by any querying proxy model algorithm, 
allowing the application of this workflow to any 
KPI-based screening exercise regardless of the complexity 
of the process the engine describes. 

Although the database is defined without bias, and the 
workflow is highly computational and intends to 
reduce human interaction with the screening engine, 
the screening process achieves the best results with 
expert knowledge systems such as the BBN. In fact, 
the BBN offers a systematic approach to screen for 
technical and nontechnical (security, for example) 
aspects at the same time in cases in which perception 
and judgment are required for the ranking exercise. 

The benchmarking of this case study with traditional, 
deterministic methods reveals similar ranking results, 
but results are achieved more efficiently and faster 
using the stochastic method. The acceleration comes 
from both the probabilistic data handling and the 
screening engine. Rather than exact values, the 
confidence levels of data are enough to run database 
algorithms and compute the objective function for the 
screening exercise. Time-consuming preparations of a 
deterministic database are therefore not necessary. 